Affect modeling is traditionally framed as mapping measurable affect manifestations, drawn from multiple modalities of user input, to affect labels. That mapping is usually inferred through machine learning. What if, instead, one trains general, subject-invariant representations that consider affect information, and then uses such representations to model affect? In this paper we assume that affect labels form an integral part of an affect representation, and not just the training signal, and we explore how the recent paradigm of contrastive learning can be employed to discover general-purpose, high-level, affect-infused representations for modeling affect. We introduce three different supervised contrastive learning approaches for training representations that consider affect information. In this initial study we test the proposed methods on arousal prediction in the RECOLA dataset, using user information from multiple modalities. Results demonstrate the representation power of contrastive learning and its efficiency in boosting the accuracy of affect models. Beyond their evidently higher performance compared to end-to-end arousal classification, the resulting representations are general-purpose and subject-agnostic, as training is guided by the general affect information available in any multimodal corpus.
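The supervised contrastive objective the abstract alludes to can be sketched as follows. This is a minimal, generic supervised contrastive loss in the style of SupCon, not the paper's exact formulation: samples sharing an affect label are pulled together in embedding space, all others are pushed apart. Function and variable names are illustrative.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch of embeddings.

    embeddings: (N, d) array; labels: (N,) array of affect labels.
    Pairs with equal labels are treated as positives."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                  # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)               # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & ~np.eye(len(z), dtype=bool)
    # mean log-probability over each anchor's positives, negated
    per_anchor = -np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor /= np.maximum(positives.sum(axis=1), 1)
    return per_anchor.mean()
```

Embeddings that cluster by affect label yield a low loss, so minimizing it shapes a representation in which the label is part of the geometry rather than only a prediction target.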
Hyperspectral images are commonly used in cultural heritage, since they provide extended information about the optical properties of materials. Processing such high-dimensional data, however, becomes challenging from the standpoint of the machine learning techniques to be applied. In this paper we propose a rank-based, tensor-based learning model to identify and classify material defects on cultural heritage monuments. In contrast to conventional deep learning approaches, the proposed high-order tensor-based learning demonstrates higher accuracy and robustness against overfitting. Experimental results on real-world data from UNESCO-protected areas indicate the superiority of the proposed scheme compared to conventional deep learning models.
Access to accurate game state information is vital for any game artificial intelligence task, including game-playing, testing, player modeling, and procedural content generation. Self-supervised learning (SSL) techniques have demonstrated the ability to infer accurate game state information from a game's high-dimensional pixel input into compressed latent representations. Contrastive learning is one popular paradigm of SSL, in which visual understanding of game images comes from contrasting dissimilar and similar game states, as defined by simple image augmentation methods. In this study we introduce a new game scene augmentation technique, named GameCLR, that takes advantage of the game engine to define and synthesize specific, highly controlled renderings of different game states, thereby boosting contrastive learning performance. We test our GameCLR contrastive learning technique on images of the CARLA driving simulator environment and compare it against the popular SimCLR baseline SSL method. Our results suggest that GameCLR can infer the game's state information from game footage more accurately than the baseline. The introduced approach allows us to conduct game artificial intelligence research by directly utilizing screen pixels as input.
Normalization is a vital process for any machine learning task, as it controls the properties of the data and affects model performance at large. To date, however, the impact of particular forms of normalization has been investigated in limited, domain-specific classification tasks, and not in a general fashion. Motivated by the lack of such a comprehensive study, in this paper we investigate the performance of Lp-constrained softmax loss classifiers across different norm orders, magnitudes, and data dimensions, in both proof-of-concept classification problems and popular real-world image classification tasks. Experimental results collectively suggest that Lp-constrained softmax loss classifiers not only can achieve more accurate classification results but, at the same time, appear to be less prone to overfitting. The core findings hold across the three popular deep learning architectures and eight datasets tested, and suggest that Lp normalization is a recommended data representation practice for image classification in terms of performance, convergence, and resistance to overfitting.
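The core operation the abstract describes is projecting each feature vector onto a sphere of fixed Lp norm before the softmax classifier. Below is a minimal sketch of that step; the function names, the random weight matrix, and the feature dimensions are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def lp_normalize(x, p=2.0, radius=1.0, eps=1e-12):
    """Constrain each row of x to have Lp norm equal to `radius`."""
    norms = (np.abs(x) ** p).sum(axis=1, keepdims=True) ** (1.0 / p)
    return radius * x / np.maximum(norms, eps)

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Lp-constrained softmax classification: normalize the features to the unit
# Lp sphere, then apply a linear softmax head (W shown here at random).
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 16))
W = rng.normal(size=(16, 10))
probs = softmax(lp_normalize(features, p=2.0, radius=1.0) @ W)
```

Varying `p` and `radius` is exactly the sweep over norm orders and magnitudes that the study investigates.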
Self-supervised learning (SSL) techniques have been widely used to learn compact and informative representations from high-dimensional complex data. In many computer vision tasks, such as image classification, such methods achieve state-of-the-art results that surpass supervised learning approaches. In this paper we investigate whether SSL methods can be leveraged for the task of learning accurate state representations of games, and if so, to what extent. For this purpose, we collect game footage and corresponding sequences of internal game states from three different 3D games: VizDoom, the CARLA racing simulator, and the Google Research Football Environment. We train an image encoder with three widely used SSL algorithms using solely raw frames, and then attempt to recover the internal state variables from the learned representations. Our results across all three games showcase significantly higher correlation between SSL representations and the game's internal state compared to pre-trained baseline models such as ImageNet. Such findings suggest that SSL-based visual encoders can yield general representations that are informative yet not tailored to a specific task, solely from game pixel information. Such representations can, in turn, form the basis for boosting the performance of downstream learning tasks in games, including game-playing, content generation, and player modeling.
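The evaluation protocol described, recovering internal state variables from frozen embeddings, is commonly realized as a linear probe. A minimal sketch, assuming frozen per-frame embeddings and one scalar state variable (e.g., an agent's speed); all names are illustrative:

```python
import numpy as np

def linear_probe_corr(embeddings, state):
    """Fit a least-squares linear probe from frozen encoder embeddings to a
    game-state variable; return the Pearson correlation of its predictions."""
    X = np.hstack([embeddings, np.ones((len(embeddings), 1))])  # add bias term
    w, *_ = np.linalg.lstsq(X, state, rcond=None)
    return np.corrcoef(X @ w, state)[0, 1]
```

An encoder whose embeddings correlate highly with internal state under such a probe is capturing game-relevant structure, which is the comparison the paper draws between SSL-trained and ImageNet-pretrained encoders.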
Differentiable Architecture Search (DARTS) has attracted considerable attention as a gradient-based Neural Architecture Search (NAS) method. Since the introduction of DARTS, little work has been done on adapting its search space to state-of-the-art architecture design principles for CNNs. In this work, we aim to address this gap by incrementally augmenting the DARTS search space with micro-design changes inspired by ConvNeXt and studying the trade-off between accuracy, evaluation layer count, and computational cost. To this end, we introduce the Pseudo-Inverted Bottleneck conv block, intended to reduce the computational footprint of the inverted bottleneck block proposed in ConvNeXt. Our proposed architecture is much less sensitive to evaluation layer count and significantly outperforms a DARTS network of similar size at layer counts as small as 2. Furthermore, with fewer layers it not only achieves higher accuracy with lower GMACs and parameter count, but GradCAM comparisons also show that our network is able to better detect distinctive features of target objects compared to DARTS.
This paper deals with the problem of statistical and system heterogeneity in a cross-silo Federated Learning (FL) framework where there exist a limited number of Consumer Internet of Things (CIoT) devices in a smart building. We propose a novel Graph Signal Processing (GSP)-inspired aggregation rule based on graph filtering, dubbed ``G-Fedfilt''. The proposed aggregator enables a structured flow of information based on the graph's topology. This behavior allows capturing the interconnection of CIoT devices and training domain-specific models. The embedded graph filter is equipped with a tunable parameter that enables a continuous trade-off between domain-agnostic and domain-specific FL. In the domain-agnostic case, it forces G-Fedfilt to act similarly to the conventional Federated Averaging (FedAvg) aggregation rule. The proposed G-Fedfilt also enables intrinsic smooth clustering based on graph connectivity, without explicit specification, which further boosts the personalization of the models in the framework. In addition, the proposed scheme enjoys communication-efficient time-scheduling to alleviate system heterogeneity. This is accomplished by adaptively adjusting the number of training data samples and the sparsity of the models' gradients to reduce communication desynchronization and latency. Simulation results show that the proposed G-Fedfilt achieves up to $3.99\%$ better classification accuracy than the conventional FedAvg with regard to model personalization on statistically heterogeneous local datasets, while it is capable of yielding up to $2.41\%$ higher accuracy than FedAvg when testing the generalization of the models.
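One simple way to realize a graph-filtered aggregation rule of this kind is to diffuse each client's parameters over the client graph and blend the result with the plain FedAvg average, with a tunable parameter interpolating between the two regimes. This is a sketch of the idea, not the paper's exact filter; the single-step diffusion, the parameter `alpha`, and all names are assumptions.

```python
import numpy as np

def graph_filtered_aggregate(client_params, adjacency, alpha=0.5):
    """Blend each client's parameters with its graph neighbourhood.

    client_params: (K, d) -- one flattened parameter vector per client.
    adjacency:     (K, K) -- symmetric, non-negative client graph.
    alpha in [0, 1]: 0 -> pure FedAvg-style uniform average (domain-agnostic),
                     1 -> fully graph-localized models (domain-specific)."""
    K = len(client_params)
    deg = adjacency.sum(axis=1)
    P = adjacency / np.maximum(deg, 1e-12)[:, None]   # row-stochastic diffusion
    graph_part = P @ client_params                     # one graph-filtering step
    fedavg_part = np.tile(client_params.mean(axis=0), (K, 1))
    return alpha * graph_part + (1 - alpha) * fedavg_part
```

With disconnected graph components, `alpha = 1` produces one model per component, which mirrors the intrinsic clustering behavior the abstract describes.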
Mapping the seafloor with underwater imaging cameras is of significant importance for various applications including marine engineering, geology, geomorphology, archaeology and biology. For shallow waters, among the underwater imaging challenges, caustics, i.e., the complex physical phenomenon resulting from the projection of light rays refracted by the wavy surface, is likely the most crucial one. Caustics is the main factor during underwater imaging campaigns that massively degrades image quality and severely affects any 2D mosaicking or 3D reconstruction of the seabed. In this work, we propose a novel method for correcting the radiometric effects of caustics on shallow underwater imagery. Contrary to the state-of-the-art, the developed method can handle seabed and riverbed of any relief, correcting the images using real pixel information, thus improving image matching and 3D reconstruction processes. In particular, the developed method employs deep learning architectures in order to classify image pixels as "non-caustics" or "caustics". It then exploits the 3D geometry of the scene to achieve a pixel-wise correction, by transferring appropriate color values between the overlapping underwater images. Moreover, to fill the current gap, we have collected, annotated and structured a real-world caustic dataset, namely R-CAUSTIC, which is openly available. Overall, based on the experimental results and validation, the developed methodology is quite promising in both detecting caustics and reconstructing their intensity.
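The final correction step reduces to replacing pixels flagged as caustics with the color values observed for the same scene points in an overlapping view. The sketch below assumes the hard parts are already done, i.e., a deep classifier has produced the caustic mask and the overlapping image has been co-registered via the scene's 3D geometry; it only illustrates the transfer itself.

```python
import numpy as np

def correct_caustic_pixels(image, overlapping_image, caustic_mask):
    """Replace caustic-flagged pixels with the values of the same scene points
    in an overlapping, already co-registered image.

    image, overlapping_image: (H, W, 3) arrays; caustic_mask: (H, W) bool."""
    corrected = image.copy()
    corrected[caustic_mask] = overlapping_image[caustic_mask]
    return corrected
```

Using real pixel values from another exposure, rather than inpainting, is what preserves radiometric consistency for the downstream matching and 3D reconstruction steps.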
360-degree panoramic videos have gained considerable attention in recent years due to the rapid development of head-mounted displays (HMDs) and panoramic cameras. One major problem in streaming panoramic videos is that panoramic videos are much larger in size compared to traditional ones. Moreover, the user devices are often in a wireless environment, with limited battery, computation power, and bandwidth. To reduce resource consumption, researchers have proposed ways to predict the users' viewports so that only part of the entire video needs to be transmitted from the server. However, the robustness of such prediction approaches has been overlooked in the literature: it is usually assumed that only a few models, pre-trained on past users' experiences, are applied for prediction to all users. We observe that those pre-trained models can perform poorly for some users because they might have drastically different behaviors from the majority, and the pre-trained models cannot capture the features in unseen videos. In this work, we propose a novel meta learning based viewport prediction paradigm to alleviate the worst prediction performance and ensure the robustness of viewport prediction. This paradigm uses two machine learning models, where the first model predicts the viewing direction, and the second model predicts the minimum video prefetch size that can include the actual viewport. We first train two meta models so that they are sensitive to new training data, and then quickly adapt them to users while they are watching the videos. Evaluation results reveal that the meta models can adapt quickly to each user, and can significantly increase the prediction accuracy, especially for the worst-performing predictions.
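The "quickly adapt them to users while they are watching" step can be sketched as a few gradient steps from a meta-trained initialization on the user's most recent samples. This is a generic fast-adaptation sketch with a linear predictor standing in for the viewing-direction model; the learning rate, step count, and all names are assumptions rather than the paper's configuration.

```python
import numpy as np

def adapt_to_user(meta_w, recent_x, recent_y, lr=0.1, steps=5):
    """Specialize a meta-trained linear viewport predictor to one user via a
    few squared-error gradient steps on their recent (features, angle) pairs."""
    w = meta_w.copy()
    for _ in range(steps):
        # gradient of mean squared error w.r.t. w
        grad = 2.0 * recent_x.T @ (recent_x @ w - recent_y) / len(recent_x)
        w -= lr * grad
    return w
```

The meta-training objective would choose `meta_w` so that these few steps suffice for any new user, which is what keeps the worst-case prediction performance from collapsing for atypical viewers.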
Background samples provide key contextual information for segmenting regions of interest (ROIs). However, they always cover a diverse set of structures, causing difficulties for the segmentation model to learn good decision boundaries with high sensitivity and precision. The issue concerns the highly heterogeneous nature of the background class, resulting in multi-modal distributions. Empirically, we find that neural networks trained with heterogeneous background struggle to map the corresponding contextual samples to compact clusters in feature space. As a result, the distribution over background logit activations may shift across the decision boundary, leading to systematic over-segmentation across different datasets and tasks. In this study, we propose context label learning (CoLab) to improve the context representations by decomposing the background class into several subclasses. Specifically, we train an auxiliary network as a task generator, along with the primary segmentation model, to automatically generate context labels that positively affect the ROI segmentation accuracy. Extensive experiments are conducted on several challenging segmentation tasks and datasets. The results demonstrate that CoLab can guide the segmentation model to map the logits of background samples away from the decision boundary, resulting in significantly improved segmentation accuracy. Code is available.